MTWatch: A Tool for the Analysis of Noisy Parallel Data
Abstract
State-of-the-art statistical machine translation (SMT) techniques require good-quality parallel data to build a translation model. The availability of large parallel corpora has increased rapidly over the past decade. However, these newly developed parallel corpora often contain significant noise. In this paper, we describe our approach for separating good-quality parallel sentence pairs from noisy parallel data. We use 10 different features within a Support Vector Machine (SVM)-based model for our classification task. We report reasonably good classification accuracy and its positive effect on overall MT accuracy.
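To make the classification setup concrete, here is a minimal sketch of a linear SVM that labels sentence pairs as good (+1) or noisy (-1). It is an illustration under stated assumptions, not the MTWatch implementation: the abstract does not list its 10 features, so the two length-ratio features, the toy training pairs, the function names, and the hyperparameters below are all hypothetical. Training uses plain subgradient descent on the L2-regularized hinge loss.

```python
def pair_features(src, tgt):
    """Two illustrative features in [0, 1] (hypothetical, not the paper's ten)."""
    s_tok, t_tok = src.split(), tgt.split()
    # Ratio of the shorter to the longer side, in tokens and in characters.
    token_ratio = min(len(s_tok), len(t_tok)) / max(len(s_tok), len(t_tok), 1)
    char_ratio = min(len(src), len(tgt)) / max(len(src), len(tgt), 1)
    return [token_ratio, char_ratio]


def train_linear_svm(X, y, lam=0.01, lr=0.05, epochs=500):
    """Subgradient descent on the L2-regularized hinge loss; labels are +1/-1."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            margin = label * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            # Regularization step: shrink the weights (the bias is not regularized).
            w = [wj * (1.0 - lr * lam) for wj in w]
            if margin < 1.0:
                # Hinge-loss step: move the boundary away from the violating point.
                w = [wj + lr * label * xj for wj, xj in zip(w, x)]
                b += lr * label
    return w, b


def predict(w, b, x):
    """Return +1 (good pair) or -1 (noisy pair)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Toy training data (invented for illustration): (source, target, label).
TRAIN = [
    ("the cat sleeps", "le chat dort", 1),
    ("the cat sleeps", "cliquez ici pour télécharger le rapport annuel complet", -1),
    ("i have two dogs", "j'ai deux chiens", 1),
    ("hello", "ceci est une phrase sans aucun rapport avec la source", -1),
    ("thank you very much", "merci beaucoup à vous", 1),
    ("i have two dogs and three cats at home", "oui", -1),
]

X = [pair_features(s, t) for s, t, _ in TRAIN]
y = [label for _, _, label in TRAIN]
w, b = train_linear_svm(X, y)

for s, t, label in TRAIN:
    print(label, predict(w, b, pair_features(s, t)), repr(s), "|", repr(t))
```

Length ratios alone only catch grossly misaligned pairs; a realistic filter would add lexical or alignment-based signals, which is presumably where a richer feature set such as the paper's becomes necessary.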
Similar resources
A method to solve the problem of missing data, outlier data and noisy data in order to improve the performance of human and information interaction
Abstract Purpose: Errors in data collection and failure to pay attention to data that are noisy in the collection process for any reason cause problems in data-based analysis and, as a result, wrong decision-making. Therefore, solving the problem of missing or noisy data before processing and analysis is of vital importance in analytical systems. The purpose of this paper is to provide a metho...
Improving the Performance of ICA Algorithm for fMRI Simulated Data Analysis Using Temporal and Spatial Filters in the Preprocessing Phase
Introduction: The accuracy of analyzing functional MRI (fMRI) data usually decreases in the presence of noise and artifact sources. A common solution for analyzing fMRI data with high noise is to use suitable preprocessing methods with the aim of data denoising. Some effects of preprocessing methods on the parametric methods such as general linear model (GLM) have previously been evalua...
Output-only Modal Analysis of a Beam Via Frequency Domain Decomposition Method Using Noisy Data
The output data from a structure is the building block for output-only modal analysis. The structure response in the output data, however, is usually contaminated with noise. Naturally, the success of output-only methods in determining the modal parameters of a structure depends on the noise level. In this paper, the possibility and accuracy of identifying the modal parameters of a simply supported...
A Novel Method for Detection of Epilepsy in Short and Noisy EEG Signals Using Ordinal Pattern Analysis
Introduction: In this paper, a novel complexity measure is proposed to detect dynamical changes in nonlinear systems using ordinal pattern analysis of time series data taken from the system. Epilepsy is considered a dynamical change in the nonlinear and complex brain system. The ability of the proposed measure for characterizing the normal and epileptic EEG signals when the signal is short or is...
Interval Analysis of Controllable Workspace for Cable Robots
Workspace analysis is one of the most important issues in robotic parallel manipulator design. However, the unidirectional constraint imposed by cables makes this analysis more challenging in cable-driven redundant parallel manipulators. Controllable workspace is one of the general workspaces in cable-driven redundant parallel manipulators due to the dependency on geometry parameter...
Publication date: 2014